Analyzing Deception Data with Hierarchical Bayesian Signal Detection Theoretic Models (Computational notebook supplement to TBD)

Authors
Affiliations

Matti Vuorre

Mircea Zloteanu

Kingston University London

Published

April 21, 2023

Preface

This document is a computational notebook supplement to TBD (Zloteanu & Vuorre, TBD). We introduce Signal Detection Theory (SDT) and show how to apply it in practice in the area of deception research. Moreover, we implement SDT models as Generalized Linear Mixed Models (GLMM) in a bayesian inferential framework in R. This document is a Quarto computational notebook. You can download the source code at TBD and reproduce the analyses in RStudio.

R environment

First, ensure that your current working directory contains both the analysis script and the data file. It is best to download the whole project directory from TBD. If you use RStudio, open up the relevant R Project (deception-sdt.Rproj) in RStudio.

Then, make sure that you have the required R packages installed. We use renv to ensure that you can install and use the exact same versions of the packages. To install those packages, execute renv::restore() in the R console (you only need to do this once).

You can now load the required packages (click “Code” to show code here, and in code blocks in the rest of the document).

We then set some options for the document output.

Code
theme_set(
  theme_linedraw() +
    theme(panel.grid = element_blank())
)
bayesplot::color_scheme_set(scheme = "brewer-Spectral")
options(digits = 3)

Next we set options for the bayesian model estimation procedures. We use as many cores as are available on the machine.

Code
options(
  mc.cores = parallel::detectCores(logical = FALSE)
)
dir.create("models", FALSE)

Preliminaries

Data

For this tutorial, we use a synthetic copy of data discussed in XYZ. Note that all categorical predictors are R factors, and the outcome is an integer or float (0s and 1s).

Code
d <- read_rds("data/dataset-synthetic.rds")
head(d)
Table 1: First six rows of the example data.
Participant Training Stimulus LieType Veracity Answer
029 Emotion 1-1 Experiential True 1
029 Emotion 2-2 Experiential False 0
029 Emotion 3-2 Experiential False 0
029 Emotion 4-1 Experiential True 1
029 Emotion 6-1 Experiential False 1
029 Emotion 7-1 Experiential False 0
  • Training is a between-subjects variable indicating which training group the participant was in,
  • LieType is a within-subjects variable; what type of lie was presented on the trial
  • Stimulus indicates which stimulus was presented
  • Veracity indicates the stimulus veracity (True or False)
  • Answer is the DV, as each answer participants give; truth=1, lie=0

Non-SDT analysis

We first analyse these data with correct / incorrect classifications. Typically, this amounts to aggregating each person’s data to proportions correct (in different conditions / groups), and then visualizing and modelling those. For this example, we ignore any groups and conditions.

First, we create a new variable encoding response accuracy.

Code
d <- d %>% 
  mutate(
    Accuracy = as.integer(as.integer(Veracity) - 1 == Answer)
  )
head(d)
Table 2: First six rows of the example data with response accuracy.
Participant Training Stimulus LieType Veracity Answer Accuracy
029 Emotion 1-1 Experiential True 1 1
029 Emotion 2-2 Experiential False 0 1
029 Emotion 3-2 Experiential False 0 1
029 Emotion 4-1 Experiential True 1 1
029 Emotion 6-1 Experiential False 1 0
029 Emotion 7-1 Experiential False 0 1

We then summarize the data to proportion correct for each person.

Code
d_accuracy_participant <- d %>% 
  summarise(Accuracy = mean(Accuracy), .by = Participant)

We can then display these proportions correct as a simple histogram.

Code
d_accuracy_participant %>%
  ggplot(aes(Accuracy)) +
  scale_x_continuous(
    "Percent correct",
    limits = c(0, 1),
    labels = percent
  ) +
  scale_y_continuous(
    "Count",
    expand = expansion(c(0, .05))
  ) +
  geom_histogram(
    color = "black",
    fill = "grey80"
  )

Figure 1: Histogram of proportions correct.

On average, a t-test shows that participants are at chance:

Code
t.test(d_accuracy_participant$Accuracy, mu = 0.5) %>% 
  parameters()
Table 3: Parameter estimate from t-test of proportions correct against chance.
Parameter Mean mu Difference CI CI_low CI_high t df_error p Method Alternative
d_accuracy_participant$Accuracy 0.506 0.5 0.006 0.95 0.487 0.524 0.607 105 0.545 One Sample t-test two.sided

However, this simple measure of accuracy can conflate any potential bias that participants might have (to e.g. prefer “true” over “false” responses) with their ability to tell True stimuli apart from False stimuli. Signal Detection Theory allows studying bias and ability to detect true/falsehoods separetely.

Signal Detection Theory

Introduction

At the heart of Signal Detection Theory is the idea that following an observation period during which a stimulus may or may not have been presented, an observer evaluates her internal degree of evidence for whether a stimulus was presented. Because of noise in the perceptual-cognitive apparatus, the latent evidence signals are not always simply absent (for observation periods, or generally trials, with no stimulus) or present (for trials with a stimulus), but instead assumed to fall on a continuum. She then has or develops some kind of inner criterion which this latent evidence must exceed for her to report “Yes, signal was present during the observation period”. On each trial, a random value from the latent evidence continuum is realized, and the participant’s decision task is simple: If this particular evidence value is greater than the criterion, the participant responds “Yes.”

SDT models are applied (sometimes routinely) in a wide range of research areas. They are very popular in, for example, perception and memory research. Depending on the task at hand and concept under study, the mysterious quantity “latent evidence” is often labelled differently. For instance, in memory research it is often called “memory strength” or “familiarity”.

In the current application, we are interested in judgments of veracity, given some stimulus that is either truthful or not. In other words, participants are presented on each trial with a stimulus that is either False or True, and their task is to decide whether they thought the stimulus was false or true by responding with either “false” or “true”. We might therefore, in this application, call this hypothesized latent quantity as, perhaps whimsically, truthiness.

There are then hypothesized to be two truthiness distributions: One for False stimuli, and one for True stimuli. SDT allows us to examine the degree to which False and True stimuli lead to different truthiness distributions.

If participants are completely unable to detect any difference between False and True stimuli, these two distributions should be identical. Alternatively, if participants somehow are sensitive to the veracity of the stimulus, False and True stimuli should lead to an internal truthiness distribution for True trials that has a greater average value than does the internal truthiness distribution following False trials.

In addition, to make a decision the participant needs some criterion to which each truthiness signal is compared to. SDT allows us to examine where participants set this criterion. For example, it may be of interest whether participants show a liberal criterion, and are likely to respond “true” to just about anything.

One of the great benefits of SDT is that it allows us to separate sensitivity and bias in examining task performance.

We visualize the basic SDT framework for deception tasks in Figure 2. This figure shows two internal truthiness distributions with some made-up parameter values. In red is the distribution for False trials: Sometimes the realized truthiness value on that trial is low, sometimes high, but importantly there is some variation. In blue is the truthiness distribution for True trials: Sometimes the sampled value is high, sometimes low, but in this example it has a greater mean than the distribution for False trials.

Code
sdt_draw <- function(d, crit) {
  tibble(
    d = c(0, d),
    Veracity = factor(d, labels = c("False", "True"))
  ) %>% 
    ggplot() +
    scale_y_continuous(
      "Density",
      expand = expansion(c(0, 0.005))
    ) +
    scale_x_continuous(
      "Truthiness"
    ) +
    scale_color_brewer(
      "Veracity",
      palette = "Set1",
      aesthetics = c("color")
    ) +
    geom_vline(
      xintercept = crit,
      linewidth = 0.5
    ) +
    stat_slab(
      aes(
        xdist = dist_normal(d), 
        color = Veracity,
        fill = after_scale(alpha(color, .25))
      ),
      p_limits = c(0.0001, 0.9999)
    ) +
    # Criterion
    annotate(
      geom = "curve",
      ncp = 9, curvature = -0.3,
      x = 3, xend = crit, y = 0.5, yend = 0.2,
      arrow = arrow(length = unit(0.5, "lines"), type = "closed", angle = 20)
    ) +
    annotate(
      geom = "text",
      x = 3, y = 0.53,
      label = "Criterion"
    ) +
    # Sensitivity
    annotate(
      geom = "segment",
      x = d, xend = 0.0, y = 0.901, yend = 0.901,
      arrow = arrow(ends = "both", length = unit(0.5, "lines"), type = "open", angle = 90)
    ) +
    annotate(
      geom = "curve",
      ncp = 9, curvature = 0.4, angle = 135,
      x = 3.3, xend = mean(c(0, d)), y = 0.901, yend = 0.901,
      arrow = arrow(length = unit(0.5, "lines"), type = "closed", angle = 20)
    ) +
    annotate(
      geom = "text",
      x = 3.3, y = 0.87,
      label = "Sensitivity"
    ) +
    theme(
      axis.text.y = element_blank(),
      axis.ticks.y = element_blank()
    )
}
sdt_draw(1, 0.3)

Figure 2: Illustration of the basic SDT model.

Although SDT-based analyses might then seem theory-laden, in fact this model is statistically widely used for binary responses with some binary predictor value.

Practice

In practice, the parameters of this basic SDT model can be calculated from two proportions. To calculate those proportions, we first classify each trial into four different categories: Participants might correctly respond “true” to True (“hit”) and “false” to False (“correct rejection”) stimuli, and incorrectly respond “true” to False (“false alarm”), and “false” to True (“miss”) stimuli. We now classify each trial in the example data to one of these four categories.

Code
sdt <- d %>% 
  mutate(
    Type = case_when(
      Veracity == "False" & Answer == 1 ~ "fa", # False alarm
      Veracity == "False" & Answer == 0 ~ "cr", # Correct rejection
      Veracity == "True" & Answer == 1 ~ "hit", # Hit
      Veracity == "True" & Answer == 0 ~ "miss", # Miss
    )
  )
head(sdt)
Table 4: Trials with SDT classification.
Participant Training Stimulus LieType Veracity Answer Accuracy Type
029 Emotion 1-1 Experiential True 1 1 hit
029 Emotion 2-2 Experiential False 0 1 cr
029 Emotion 3-2 Experiential False 0 1 cr
029 Emotion 4-1 Experiential True 1 1 hit
029 Emotion 6-1 Experiential False 1 0 fa
029 Emotion 7-1 Experiential False 0 1 cr

We leave it as an excercise to the reader to find each classification above in Figure 2.

To calculate the SDT parameters, we must then aggregate the data to per-participant counts of these four categories. For computations, we also pivot the data to a “wide” format.

Code
# Aggregate per participant
sdt <- sdt %>% 
  count(Participant, Type) %>% 
  pivot_wider(names_from = Type, values_from = n)
head(sdt)
Table 5: SDT classification counts.
Participant cr fa hit miss
001 7 13 10 10
002 6 14 12 8
003 10 10 9 11
004 8 12 10 10
005 4 16 13 7
006 11 9 13 7

The hit rate is then the proportion of “hits” over the sum of “hits” and “misses”. The false alarm rate is the proportion of “false alarms” over the sum of false alarms and correct rejections. Those rates are then “z-scored” to establish the latent continuum as a standard normal distribution. Sensitivity, often called d-prime, is then the distance between the distributions’ means. The response criterion is calculated as (negative) half of the sum of the z-scores. In practice, we calculate all these in R as follows:

Code
# Calculate measures
sdt <- sdt %>% 
  mutate(
    hr = hit / (hit + miss),
    far = fa / (fa + cr),
    zhr = qnorm(hr),
    zfa = qnorm(far),
    dprime = zhr - zfa,
    crit = -0.5 * (zhr + zfa)
  )
head(sdt)
Participant cr fa hit miss hr far zhr zfa dprime crit
001 7 13 10 10 0.50 0.65 0.000 0.385 -0.385 -0.193
002 6 14 12 8 0.60 0.70 0.253 0.524 -0.271 -0.389
003 10 10 9 11 0.45 0.50 -0.126 0.000 -0.126 0.063
004 8 12 10 10 0.50 0.60 0.000 0.253 -0.253 -0.127
005 4 16 13 7 0.65 0.80 0.385 0.842 -0.456 -0.613
006 11 9 13 7 0.65 0.45 0.385 -0.126 0.511 -0.130

We can then compute basic statistical tests on these person-specific parameters. For example, we can ask whether sensitivity (dprime) was greater than zero.

Code
t.test(sdt$dprime) %>% 
  parameters()
Results of a t-test of dprime against zero.
Parameter Mean mu Difference CI CI_low CI_high t df_error p Method Alternative
sdt$dprime 0.031 0 0.031 0.95 -0.074 0.137 0.589 105 0.557 One Sample t-test two.sided

And whether participants show a liberal bias (negative criterion) or a conservative one (positive criterion).

Code
t.test(sdt$crit) %>% 
  parameters()
Results of a t-test of criterion against zero.
Parameter Mean mu Difference CI CI_low CI_high t df_error p Method Alternative
sdt$crit -0.232 0 -0.232 0.95 -0.284 -0.179 -8.75 105 0 One Sample t-test two.sided

In these example data, sensitivity was not statistically significantly different from zero: On average, participants are unable to tell False from True stimuli, their sensitivity to truthiness was not significantly different from zero. On the other hand, the criterion was very low, indeed negative, so participants were generally liberal to answer “true”. We can use the aggregate parameters to redraw Figure 2 from above (Figure 3):

Code
sdt_draw(mean(sdt$dprime), mean(sdt$crit))

Figure 3: SDT model estimated from example data: The implied distributions of truthiness for False and True trials. As is seen, the distributions overlap completely and thus sensitivity is at zero. Criterion is below zero, so on average, ‘True’ responses are more likely; this is often called a liberal criterion.

We can also visualize the participants’ parameters. Note that there were very few trials per participant, thus many participants will have identical parameter estimates. We therefore reduce overplotting by jittering the points a little bit.

Code
sdt %>% 
  ggplot(aes(crit, dprime)) +
  geom_hline(yintercept = 0, lty = 2, linewidth = .25) +
  geom_vline(xintercept = 0, lty = 2, linewidth = .25) +
  labs(x = "Criterion", y = "Sensitivity") +
  geom_smooth(
    method = "lm",
    linewidth = .33,
    color = "black",
    alpha = .2
  ) +
  geom_point(
    shape = 1, 
    position = position_jitter(.01, .01)
  ) +
  stat_centroid(
    .fun = mean, col = "red",
    geom = "point", size = 2
  ) +
  theme(aspect.ratio = 1)

Figure 4: Scatterplot of participant-specific criterion and sensitivity parameters. Red point indicates the bivariate mean.

Relation with accuracy

Typically, although conceptually different and on different scales, proportion corrects are typically very similar to d-primes.

Code
pa <- left_join(
  sdt,
  d_accuracy_participant
) %>% 
  ggplot(aes(Accuracy, dprime)) +
  scale_x_continuous(
    "Percent correct",
    breaks = extended_breaks(5),
    labels = percent
  ) +
  scale_y_continuous(
    "Sensitivity",
    breaks = extended_breaks(7)
  ) +
  geom_hline(yintercept = 0, lty = 2, linewidth = .25) +
  geom_vline(xintercept = 0.5, lty = 2, linewidth = .25) +
  geom_smooth(
    method = "lm",
    linewidth = .33,
    color = "black",
    alpha = .2
  ) +
  geom_point(shape = 1, position = position_jitter(.005, .01)) +
  theme(aspect.ratio = 1)
pb <- last_plot() + 
  aes(y = crit) + 
  scale_y_continuous(
    "Criterion",
    breaks = extended_breaks(7)
  )
pa | pb

Figure 5: Scatterplot of participant-specific sensitivity and accuracy (proportion correct) parameters.

Now that we have gleaned some insight into the inner workings of the SDT framework, we turn to estimating SDT models as generalized linear mixed models (GLMMs).

Model 0

Above, we used traditional methods and formulas for calculating SDT parameters for each participant. We then used those person-specific estimates in a further statistical test. We can get equivalent average and participant-specific parameters more efficiently with a GLMM. We label this model with a zero to indicate that it is a “baseline” model: We do not yet include any parameters to estimate the effects of various experimental manipulations.

Coding of predictor variables

We first need to ensure the categorical predictors are used appropriately in the model. We “contrast-code” Veracity. This ensures the model intercept is “between” False and True trials, or their average. It then corresponds to 1 - criterion.

Code
contrasts(d$Veracity) <- c(-0.5, 0.5)
contrasts(d$Veracity)
      [,1]
False -0.5
True   0.5

Model specification

We then define the GLMM as a mixed effects formula in R’s extended formula syntax. Below, we regress Answer (0 = “False”, 1 = “True”) on an intercept (in R syntax 1 means an intercept because it adds a column of 1s in the design matrix) and the slope of Veracity (whether the stimulus was True or False). We also specify both as correlated random effects across Participants, and the intercepts in addition as random across Stimulus

Code
f0 <- Answer ~ 1 + Veracity + 
  (1 + Veracity | Participant) + 
  (1 | Stimulus)

With this syntax, the model is parameterized such that there will be two main parameters. The intercept describes the (negative) criterion: It is half the sum of the z-scores of the hit and false alarm rates, or the z-score of the “true” responses for the “average” stimulus. The slope of Veracity indicates sensitivity–the separation of the latent variables between False and True trials. This parameter is typically called d-prime in many SDT applications.

Prior distributions

Although it is optional, for illustrative purposes we define some vaguely informative prior distributions on the population-level parameters.

Code
p0 <- prior(normal(0, 1), class = b) +
  prior(student_t(3, 0, 1), class = sd) +
  prior(lkj(1), class = cor)

Sampling

With the model formula, data, and prior distribution, we can then sample from the model’s posterior distribution. In addition, it is critical here to use the Bernoulli outcome distribution with a probit link; this specifies that we assume normality of the latent variable. We also specify the “file” argument, because some models take a while to sample, with “file” we can load the model from a file instead in subsequent runs.

Code
m0 <- brm(
  formula = f0, 
  family = bernoulli(link = probit),
  data = d,
  prior = p0,
  file = "models/m0"
)

It is good practice to then check model convergence. For simple models there almost never are any issues. Nevertheless you should always at least use numerical model diagnostics to evaluate if the sampling has performed as intended. The most commonly used metric is called “Rhat”, and this number should be almost exactly 1.0 (within about 0.01 of 1.0). We will return to this below, when looking at the model’s numerical summary.

Another way to do this is to visually see whether the four chains of samples are well mixed:

Code
mcmc_plot(m0, type = "trace") +
  theme(legend.position = "bottom")

Figure 6: Caterpillar plots of the main parameters’ mcmc chains.

Similarly, we want to see evidence that the model is able to reproduce the observed data:

Code
pp_check(m0, type = "bars_grouped", group = "Veracity", ndraws = 100) + 
  scale_x_continuous(breaks = c(0, 1), labels = c("False", "True"))

Figure 7: Posterior predictive check.

Model summary

The default model summary is printed with summary(). Here we use an additional arugment to show the prior distributions used.

Code
summary(m0, prior = TRUE)
 Family: bernoulli 
  Links: mu = probit 
Formula: Answer ~ 1 + Veracity + (1 + Veracity | Participant) + (1 | Stimulus) 
   Data: d (Number of observations: 4240) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Priors: 
b ~ normal(0, 1)
Intercept ~ student_t(3, 0, 2.5)
L ~ lkj_corr_cholesky(1)
<lower=0> sd ~ student_t(3, 0, 1)

Group-Level Effects: 
~Participant (Number of levels: 106) 
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept)                0.18      0.03     0.12     0.24 1.00     1623
sd(Veracity1)                0.37      0.06     0.24     0.48 1.00     1721
cor(Intercept,Veracity1)    -0.14      0.23    -0.59     0.32 1.00      772
                         Tail_ESS
sd(Intercept)                1796
sd(Veracity1)                1526
cor(Intercept,Veracity1)     1234

~Stimulus (Number of levels: 21) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.67      0.12     0.48     0.95 1.01      831     1515

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.23      0.16    -0.08     0.55 1.03      235      507
Veracity1    -0.18      0.06    -0.30    -0.05 1.00     2407     2715

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

We first return to the “Rhat” number from above. Above, each population level effect has an Rhat value of ~1.0 and we are reassured of model convergence. If this was not the case, we would first try drawing more samples from the model (see ?brm).

After confirming that the model estimation algorithm has converged, and that the model doesn’t have glaring errors in reproducing the data, we can proceed to interpret the estimates. Instead of the above verbose output, we use parameters() to display a neat summary. The regression coefficients describe the criterion (-Intercept) and d-prime (slope):

Code
parameters(m0, centrality = "mean")
Table 6: Summary of the baseline model’s main parameters.
Parameter Mean CI CI_low CI_high pd Rhat ESS
b_Intercept 0.232 0.95 -0.078 0.554 0.937 1.03 217
b_Veracity1 -0.176 0.95 -0.300 -0.051 0.999 1.00 2392

The above numbers are the posterior means and 95%CIs. Those are calculated from the samples we drew from the posterior distribution using brm() above. Note that these summarise differ somewhat from the manually calculated basic SDT summaries above, because the model is different (random stimulus intercepts). Those samples can also be visualized:

Code
gather_draws(m0, b_Intercept, b_Veracity1) %>% 
  mutate(
    .variable = factor(
      .variable, 
      levels = c("b_Intercept", "b_Veracity1"), 
      labels = c("Criterion", "dprime")
    )
  ) %>% 
  # Negate intercept to criterion
  mutate(.value = if_else(.variable == "Criterion", -.value, .value)) %>% 
  ggplot(aes(.value, .variable)) +
  scale_x_continuous(
    "Parameter value",
    breaks = extended_breaks(7)
  ) +
  scale_y_discrete(
    "Parameter",
    expand = expansion(0.01)
  ) +
  geom_vline(xintercept = 0, linewidth = 0.25) +
  stat_halfeye(
    adjust = 1.5,
    slab_color = "black",
    slab_linewidth = 0.25,
    slab_fill = "grey80",
    normalize = "xy",
    height = 0.75
  )

Figure 8: Estimated parameters’ posterior distributions for Model 0.

Alternative parameterisation 1

The above parameterisation is the most straightforward and produces parameter estimates that directly reflect the (negative) criterion and d-prime. However, because the model is parameterized as a linear model with an intercept and a slope, the two Veracity conditions can end up having different priors even though that might not be intended.

It is then often useful to reparameterize the model such that the population level parameters indicate means of the latent variable for each level of veracity. That is, instead of estimating an intercept and a slope, we directly estimate two means. This ensures using the same prior on the two veracity conditions’ latent variable. Because we are treating Veracity as a factor in R, removing the intercepts with a 0 does the trick:

Code
f0a <- Answer ~ 0 + Veracity + 
  (0 + Veracity | Participant) + 
  (1 | Stimulus)
m0a <- brm(
  f0a,  
  family = bernoulli(link = probit),
  data = d,
  file = "models/m0a"
)
parameters(m0a, centrality = "mean")
Parameter Mean CI CI_low CI_high pd Rhat ESS
b_VeracityFalse 0.309 0.95 -0.012 0.600 0.972 1.01 400
b_VeracityTrue 0.132 0.95 -0.177 0.423 0.814 1.01 382

These parameters now reflect the model-predicted z-scores of the probabilities of answering “True” for False and True stimuli, respectively. We can convert them to SDT metrics as follows (note that we can here reverse the previous model’s intercept to directly give the criterion)

Code
as_draws_df(m0a, variable = "b_", regex = TRUE) %>% 
  mutate(
    dprime = b_VeracityTrue - b_VeracityFalse,
    criterion = -0.5 * (b_VeracityTrue + b_VeracityFalse)
  ) %>% 
  parameters(centrality = "mean")
Parameter Mean CI_low CI_high pd
b_VeracityFalse 0.309 -0.012 0.600 0.972
b_VeracityTrue 0.132 -0.177 0.423 0.814
dprime -0.178 -0.307 -0.049 0.997
criterion -0.221 -0.500 0.089 0.935

Model 1

We then proceed to a more complex model, and ask whether bias or sensitivity differ between the training groups. First, we ensure that “None” is the baseline level for Training.

Code
d <- d %>% 
  mutate(
    Training = fct_relevel(Training, "None")
  )

Then, we can simply enter Training as a main effect and interaction with Veracity.

Code
f1 <- Answer ~ 1 + Veracity * Training + 
  (1 + Veracity | Participant) + 
  (1 | Stimulus)
m1 <- brm(
  f1,  
  family = bernoulli(link = probit),
  data = d,
  file = "models/m1"
)

The default parameterization, and adding Training with dummy contrasts, leads to an intercept that is the negative criterion for Training = “None”. Differences in negative criterion between this baseline and the other two groups are the two Training main effects. The main effect of Veracity indicates dprime for the baseline group, and the two interactions are differences in dprime between the baseline and the two other training groups. We display summaries of the estimates below, while negating the intercepts to criteria

Code
gather_draws(m1, `b_.*`, regex = TRUE) %>%
  mutate(
    .value = if_else(
      str_detect(.variable, "Veracity"), 
      .value, 
      -.value
    )
  ) %>% 
  mean_qi()
.variable .value .lower .upper .width .point .interval
b_Intercept -0.212 -0.514 0.093 0.95 mean qi
b_TrainingBogus -0.038 -0.175 0.099 0.95 mean qi
b_TrainingEmotion -0.022 -0.160 0.112 0.95 mean qi
b_Veracity1 -0.312 -0.536 -0.098 0.95 mean qi
b_Veracity1:TrainingBogus 0.185 -0.081 0.464 0.95 mean qi
b_Veracity1:TrainingEmotion 0.184 -0.083 0.462 0.95 mean qi

We can then use functions in the emmeans package to calculate other quantities, such as converting differences and interactions to conditional means. However, we cannot here negate the intercepts to directly correspond to criteria.

Code
# Criteria
emm_m1_c1 <- emmeans(m1, ~Training) %>% 
  parameters(centrality = "mean")
# Differences in criteria
emm_m1_c2 <- emmeans(m1, ~Training) %>% 
  contrast("pairwise") %>% 
  parameters(centrality = "mean")

# Dprimes for three groups
emm_m1_d1 <- emmeans(m1, ~Veracity + Training) %>% 
  contrast("revpairwise", by = "Training") %>% 
  parameters(centrality = "mean")
# Differences between groups
emm_m1_d2 <- emmeans(m1, ~Veracity + Training) %>% 
  contrast(interaction = c("revpairwise", "pairwise")) %>% 
  parameters(centrality = "mean")

reduce(list(emm_m1_c1, emm_m1_c2, emm_m1_d1, emm_m1_d2), bind_rows) %>% 
  select(c(1:2, 4:5))
Table 7: Criteria conditional means and group differences (first six rows), and groups’ dprimes and differences therein (last six rows).
Parameter Mean CI_low CI_high
None 0.212 -0.093 0.514
Bogus 0.250 -0.056 0.543
Emotion 0.234 -0.063 0.533
None - Bogus -0.038 -0.175 0.099
None - Emotion -0.022 -0.160 0.112
Bogus - Emotion 0.016 -0.113 0.145
True - False, None -0.312 -0.536 -0.098
True - False, Bogus -0.127 -0.320 0.065
True - False, Emotion -0.128 -0.321 0.065
True - False, None - Bogus -0.185 -0.464 0.081
True - False, None - Emotion -0.184 -0.462 0.083
True - False, Bogus - Emotion 0.001 -0.247 0.252

Typically, we are more interested in visualizations than tables. We saved the quantities above in objects, which we visualize below:

Code
bind_rows(
  bind_rows(emm_m1_d1, emm_m1_d2) %>% 
  mutate(
    Parameter = str_remove(Parameter, "True - False, ") %>% 
      fct_inorder(),
    x = "dprime"
  ),
  bind_rows(emm_m1_c1, emm_m1_c2) %>% 
    mutate(
      Parameter = fct_inorder(Parameter),
      x = "Negative criterion"
    )
) %>% 
  mutate(
    t = if_else(str_detect(Parameter, " - "), "Differences", "Group means") %>% 
      fct_inorder()
  ) %>% 
  ggplot(aes(Parameter, Mean)) +
  labs(
    x = "Training group (or difference)",
    y = "Posterior mean and 95%CI"
  ) +
  geom_hline(yintercept = 0, linewidth = .2) +
  geom_pointrange(aes(ymin = CI_low, ymax = CI_high)) +
  facet_grid(
    x~t,
    scales = "free"
  )

Figure 9: Posterior means and 95%CIs of the criterion and dprime parameters from Model 1.

Alternative parameterization

As above, in the linear model parameterisation above, prior distributions on parameters might have unintended consequences. We therefore reparameterize the model to estimate means of the latent variable instead.

Code
f1a <- Answer ~ 0 + Veracity %in% Training + 
  (0 + Veracity | Participant) + 
  (1 | Stimulus)
m1a <- brm(
  f1a,  
  family = bernoulli(link = probit),
  data = d,
  file = "models/m1a"
)
Code
parameters(m1a, centrality = "mean")
Parameter Mean CI CI_low CI_high pd Rhat ESS
b_VeracityFalse:TrainingNone 0.375 0.95 0.039 0.712 0.986 1.01 553
b_VeracityTrue:TrainingNone 0.059 0.95 -0.266 0.393 0.639 1.01 579
b_VeracityFalse:TrainingBogus 0.316 0.95 -0.008 0.639 0.971 1.01 572
b_VeracityTrue:TrainingBogus 0.191 0.95 -0.128 0.513 0.882 1.01 567
b_VeracityFalse:TrainingEmotion 0.297 0.95 -0.015 0.625 0.967 1.01 601
b_VeracityTrue:TrainingEmotion 0.173 0.95 -0.140 0.497 0.859 1.01 539

Calculating the SDT parameters then takes some more work, however. We need to manually calculate them for each group.

Code
x <- gather_draws(m1a, `b_.*`, regex = TRUE) %>% 
  separate(.variable, c("Veracity", "Training"), sep = ":") %>% 
  mutate(
    Veracity = str_remove(Veracity, "b_Veracity"),
    Training = str_remove(Training, "Training")
  ) %>% 
  pivot_wider(names_from = Veracity, values_from = .value) %>% 
  mutate(
    Criterion = -0.5 * (True + False),
    dprime = True - False
  ) %>% 
  select(-False, -True) %>% 
  pivot_longer(Criterion:dprime)
x2 <- x %>% 
  group_by(name) %>%
  compare_levels(value, Training)
Code
bind_rows(x, x2) %>% 
  mutate(
    t = if_else(str_detect(Training, "-"), "Difference", "Mean") %>% 
      fct_inorder(),
    Training = fct_inorder(Training)
  ) %>% 
  ggplot(aes(Training, value)) +
  labs(
    x = "Training group (or difference)",
    y = "Posterior mean and 95%CI"
  ) +
  geom_hline(yintercept = 0, linewidth = 0.25) +
  stat_halfeye(
    normalize = "xy",
    width = 0.5,
    slab_color = "black",
    slab_linewidth = 0.25,
    slab_fill = "grey80"
  ) +
  facet_grid(
    name~t, 
    scales = "free"
  )

Figure 10: Posterior distributions with 66% (thick) and 95%CIs (thin lines) of the criterion and dprime parameters from Model 1.

Region of practical equivalence

Because our estimates are random samples from the posterior distribution, it is easy to calculate further inferential quantities. For example, we might ask whether the estimated criteria, dprimes, or their differences were practically equivalent to zero. To do so, we establish arbitrarily that the region of practical equivalence to zero is from -0.1 to 0.1. We then calculate percentages of each posterior distribution that lie within this ROPE.

Code
rope <- bind_rows(x, x2) %>% 
  mutate(
    t = if_else(str_detect(Training, "-"), "Difference", "Mean") %>% 
      fct_inorder(),
    Training = fct_inorder(Training)
  ) %>% 
  group_by(Training, name, t) %>% 
  group_modify(
    ~describe_posterior(
      .$value, 
      test = "rope", 
      rope_ci = 1, 
      rope_range = c(-0.1, 0.1)
    )
  ) %>% 
  select(1:3, ROPE_Percentage)
Code
bind_rows(x, x2) %>% 
  mutate(
    t = if_else(str_detect(Training, "-"), "Difference", "Mean") %>% 
      fct_inorder(),
    Training = fct_inorder(Training)
  ) %>% 
  ggplot(aes(Training, value)) +
  labs(
    x = "Training group (or difference)",
    y = "Posterior mean and 95%CI"
  ) +
  geom_hline(yintercept = 0, linewidth = 0.25) +
  scale_fill_manual(
    "In ROPE",
    values = c(alpha("dodgerblue1", .5), alpha("dodgerblue4", .75)),
    aesthetics = c("slab_fill")
  ) +
  stat_halfeye(
    normalize = "xy",
    width = 0.5,
    slab_color = "black",
    slab_linewidth = 0.25,
    aes(slab_fill = after_stat(between(y, -0.1, 0.1)))
  ) +
  geom_text(
    data = rope,
    aes(label = percent(ROPE_Percentage, 1)),
    y = 0.05, nudge_x = -0.2
  ) +
  facet_grid(
    name~t, 
    scales = "free"
  )

Figure 11: Posterior distributions with 66% (thick) and 95%CIs (thin lines) of the criterion and dprime parameters from Model 1. The densities are colored according to whether the value lies within (dark) or outside the ROPE from -0.1 to 0.1. Proportions of the posterior distribution within the ROPE are displayed in numbers.

Model 2

Finally, we include the within-person manipulation LieType as well.

Code
f2 <- Answer ~ 1 + Veracity * Training * LieType + 
  (1 + Veracity * LieType | Participant) + 
  (1 | Stimulus)
m2 <- brm(
  f2,  
  family = bernoulli(link = probit),
  data = d,
  file = "models/m2"
)
Code
parameters(m2, centrality = "mean")
Parameter Mean CI CI_low CI_high pd Rhat ESS
b_Intercept 0.151 0.95 -1.226 1.531 0.589 1 2084
b_Veracity1 -0.293 0.95 -0.521 -0.058 0.992 1 2133
b_TrainingBogus 0.047 0.95 -0.116 0.212 0.720 1 2435
b_TrainingEmotion 0.154 0.95 -0.005 0.315 0.972 1 2292
b_LieTypeExperiential 0.118 0.95 -1.291 1.547 0.568 1 1978
b_Veracity1:TrainingBogus 0.100 0.95 -0.219 0.415 0.741 1 2674
b_Veracity1:TrainingEmotion 0.186 0.95 -0.127 0.489 0.881 1 2253
b_Veracity1:LieTypeExperiential 0.399 0.95 -0.334 1.148 0.862 1 964
b_TrainingBogus:LieTypeExperiential -0.019 0.95 -0.233 0.194 0.573 1 3477
b_TrainingEmotion:LieTypeExperiential -0.290 0.95 -0.498 -0.076 0.997 1 3082
b_Veracity1:TrainingBogus:LieTypeExperiential 0.202 0.95 -0.270 0.668 0.803 1 3041
b_Veracity1:TrainingEmotion:LieTypeExperiential 0.013 0.95 -0.445 0.479 0.519 1 2590

From these estimates, we can again use emmeans to produce other inferential quantities, like differences:

Code
emm_m2_d1 <- emmeans(m2, ~Veracity | Training * LieType) %>% 
  contrast("revpairwise") %>% 
  parameters(centrality = "mean")
emm_m2_d2 <- emmeans(m2, ~Veracity + Training * LieType) %>% 
  contrast(interaction = c("revpairwise", "pairwise"), by = "LieType") %>% 
  parameters(centrality = "mean")

# Criteria
emm_m2_c1 <- emmeans(m2, ~Training * LieType) %>% 
  parameters(centrality = "mean")
emm_m2_c2 <- emmeans(m2, ~Training | LieType) %>% 
  contrast("pairwise") %>% 
  parameters(centrality = "mean")
Code
# Dprimes
list(emm_m2_c1, emm_m2_c2, emm_m2_d1, emm_m2_d2) %>% 
  map(~tibble(.)) %>% 
  bind_rows() %>% 
  mutate(
    p = if_else(str_detect(Parameter, "True - False"), "dprime", "Criterion"),
    Parameter = str_remove(Parameter, "True - False, ")
  ) %>% 
  separate(Parameter, c("Training", "LieType"), sep = ", ") %>% 
  mutate(
    t = if_else(str_detect(Training, " - "), "Differences", "Group means") %>% 
      fct_inorder(),
    Training = fct_inorder(Training)
  ) %>% 
  ggplot(aes(Training, Mean, col = LieType)) +
  labs(
    x = "Training group (or difference)",
    y = "Posterior mean and 95%CI"
  ) +
  geom_hline(yintercept = 0, linewidth = .25) +
  geom_pointrange(
    aes(ymin = CI_low, ymax = CI_high),
    position = position_dodge(width = .25)
    ) +
  facet_grid(p~t, scales = "free")

Figure 12: Posterior means and 95%CIs of the criterion and dprime parameters, or differences therein, from Model 2.

Alternative parameterization

Code
f2a <- Answer ~ 0 + (Training / Veracity) %in% LieType + 
  (0 + LieType / Veracity | Participant) + 
  (1 | Stimulus)
m2a <- brm(
  f2a,  
  family = bernoulli(link = probit),
  data = d,
  file = "models/m2a"
)
Code
Parameter Median CI CI_low CI_high pd Rhat ESS
b_TrainingNone:LieTypeAffective 0.148 0.95 -1.216 1.505 0.583 1.00 1666
b_TrainingBogus:LieTypeAffective 0.200 0.95 -1.202 1.564 0.614 1.00 1691
b_TrainingEmotion:LieTypeAffective 0.303 0.95 -1.112 1.653 0.671 1.00 1684
b_TrainingNone:LieTypeExperiential 0.273 0.95 -0.053 0.612 0.950 1.01 933
b_TrainingBogus:LieTypeExperiential 0.298 0.95 -0.014 0.632 0.969 1.01 905
b_TrainingEmotion:LieTypeExperiential 0.139 0.95 -0.187 0.469 0.800 1.01 895
b_TrainingNone:LieTypeAffective:Veracity1 -0.294 0.95 -0.522 -0.070 0.995 1.00 5535
b_TrainingBogus:LieTypeAffective:Veracity1 -0.189 0.95 -0.393 0.017 0.966 1.00 5909
b_TrainingEmotion:LieTypeAffective:Veracity1 -0.106 0.95 -0.316 0.096 0.854 1.00 5600
b_TrainingNone:LieTypeExperiential:Veracity1 0.097 0.95 -0.606 0.800 0.618 1.01 814
b_TrainingBogus:LieTypeExperiential:Veracity1 0.405 0.95 -0.258 1.083 0.888 1.01 730
b_TrainingEmotion:LieTypeExperiential:Veracity1 0.303 0.95 -0.363 0.987 0.822 1.01 821

Conclusion

TBD

Further reading

The GLMM analysis of SDT models in R is based on Vuorre (2017), which in turn drew on discussions in Rouder and Lu (2005), Rouder et al. (2007), DeCarlo (1998), DeCarlo (2010), and Decarlo (2003). Macmillan and Creelman (2005) is a classic introductory text on SDT, and Stanislaw and Todorov (1999) discusses the calculations required for SDT metrics in some detail. Bürkner (2017) is a good introductory text on the brms package; Wickham and Grolemund (2016) is an excellent book on R; and McElreath (2016) is a recommended introductory textbook on bayesian statistics and probabilistic modelling.

References

Bürkner, Paul-Christian. 2017. “Brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.
DeCarlo, Lawrence T. 1998. “Signal Detection Theory and Generalized Linear Models.” Psychological Methods 3 (2): 186–205. https://doi.org/10.1037/1082-989X.3.2.186.
Decarlo, Lawrence T. 2003. “Using the PLUM Procedure of SPSS to Fit Unequal Variance and Generalized Signal Detection Models.” Behavior Research Methods, Instruments, & Computers 35 (1): 49–56. https://doi.org/10.3758/BF03195496.
DeCarlo, Lawrence T. 2010. “On the Statistical and Theoretical Basis of Signal Detection Theory and Extensions: Unequal Variance, Random Coefficient, and Mixture Models.” Journal of Mathematical Psychology 54 (3): 304–13. https://doi.org/10.1016/j.jmp.2010.01.001.
Macmillan, Neil A., and C. Douglas Creelman. 2005. Detection Theory: A User’s Guide. 2nd ed. Mahwah, N.J: Lawrence Erlbaum Associates.
McElreath, Richard. 2016. Statistical Rethinking: A Bayesian Course with Examples in r and Stan. CRC Press.
Rouder, Jeffrey N., and Jun Lu. 2005. “An Introduction to Bayesian Hierarchical Models with an Application in the Theory of Signal Detection.” Psychonomic Bulletin & Review 12 (4): 573–604. https://doi.org/10.3758/BF03196750.
Rouder, Jeffrey N., Jun Lu, Dongchu Sun, Paul Speckman, Richard D. Morey, and Moshe Naveh-Benjamin. 2007. “Signal Detection Models with Random Participant and Item Effects.” Psychometrika 72 (4): 621–42. https://doi.org/10.1007/s11336-005-1350-6.
Stanislaw, Harold, and Natasha Todorov. 1999. “Calculation of Signal Detection Theory Measures.” Behavior Research Methods, Instruments, & Computers 31 (1): 137–49. http://link.springer.com/article/10.3758/BF03207704.
Vuorre, Matti. 2017. “Bayesian Estimation of Signal Detection Models.” October 9, 2017. https://vuorre.netlify.app/posts/2017-10-09-bayesian-estimation-of-signal-detection-theory-models.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science. http://r4ds.had.co.nz/.

Appendix

Simpler Model 0

This is just to show that the estimated values reflect manually calculated closely when we drop stimulus random effects

Code
f0a2 <- Answer ~ 1 + Veracity + (1 + Veracity | Participant)
m0a2 <- brm(
  formula = f0a2,  
  family = bernoulli(link = probit),
  data = d,
  file = "models/m0a2"
)
f0a3 <- Answer ~ 0 + Veracity + (0 + Veracity | Participant)
m0a3 <- brm(
  formula = f0a3,  
  family = bernoulli(link = probit),
  data = d,
  file = "models/m0a3"
)
parameters(m0a2, centrality = "mean")
Parameter Mean CI CI_low CI_high pd Rhat ESS
b_Intercept 0.218 0.95 0.170 0.267 1.000 1 3504
b_Veracity1 0.030 0.95 -0.066 0.125 0.737 1 4315
Code
as_draws_df(m0a3, variable = "b_", regex = TRUE) %>% 
  mutate(
    dprime = b_VeracityTrue - b_VeracityFalse,
    crit = -0.5 * (b_VeracityTrue + b_VeracityFalse)
  ) %>% 
  parameters(centrality = "mean")
Parameter Mean CI_low CI_high pd
b_VeracityFalse 0.204 0.134 0.274 1.000
b_VeracityTrue 0.233 0.167 0.300 1.000
dprime 0.029 -0.063 0.123 0.716
crit -0.218 -0.266 -0.171 1.000
Code
sdt %>% 
  select(Participant, zfa, zhr, dprime, crit) %>% 
  pivot_longer(-Participant) %>% 
  summarise(t = parameters(t.test(value)), .by = name)
name t
zfa value
zhr value
dprime value
crit value
Code
f0a4 <- bf(
  Answer ~ Phi(dprime * Veracity - criterion),
  dprime ~ 1 + (1 |s| Participant), 
  criterion ~ 1 + (1 |s| Participant),
  nl = TRUE
)
p0a4 <- prior(normal(0, 1), nlpar = "dprime") +
  prior(normal(0, 1), nlpar = "criterion") +
  prior(normal(0, 1), nlpar = "dprime", class = "sd") +
  prior(normal(0, 1), nlpar = "criterion", class = "sd")
m0a4 <- brm(
  f0a4,  
  family = bernoulli(link = "identity"),
  data = d,
  prior = p0a4,
  file = "models/m0a4"
)
parameters(m0a3, centrality = "mean")
Parameter Mean CI CI_low CI_high pd Rhat ESS
b_VeracityFalse 0.204 0.95 0.134 0.274 1 1 3889
b_VeracityTrue 0.233 0.95 0.167 0.300 1 1 4984

Receiving Operator Characteristic

People sometimes like talking about the receiver operating characteristic and area under the curve measures. At least they allow for nice plots (but those require some hairy code).

The ROC plots hit rates against false alarm rates for varying threshold levels. However, we only get one threshold (per participant):

Code
sdt %>% 
  ggplot(aes(far, hr)) +
  geom_abline(lty = 2, linewidth = .33) +
  scale_x_continuous(
    "False alarm rate",
    limits = c(0, 1),
    expand = expansion(0.01)
  ) +
  scale_y_continuous(
    "Hit rate",
    limits = c(0, 1),
    expand = expansion(0.01)
  ) +
  geom_smooth(
    method = "lm",
    linewidth = .33,
    color = "black",
    alpha = .2
  ) +
  geom_point(
    shape = 1,
    position = position_jitter(.01, .01)
  ) +
  stat_centroid(
    .fun = mean, col = "red",
    geom = "point", size = 2
  ) +
  theme(aspect.ratio = 1)

One solution to this is to plot the estimated hit rate for all false alarm rates. We do so here for the average participant. We display 100 random ROCs from the posterior to display uncertainty.

Code
grid <- tibble(
  x = seq(-2.5, 2.5, length.out = 30),
  far = pnorm(x, lower = FALSE),
  zfar = qnorm(far)
)
rocs <- as_draws_df(m0a) %>% 
  slice_sample(n = 100) %>% 
  select(starts_with("b_")) %>% 
  rownames_to_column("i") %>% 
  crossing(grid) %>% 
  mutate(hr = pnorm(x, b_VeracityTrue - b_VeracityFalse, lower = FALSE))
Code
rocs %>% 
  ggplot(aes(far, hr)) +
  geom_abline(lty = 2, linewidth = .33) +
  scale_x_continuous(
    "False alarm rate",
    limits = c(0, 1),
    expand = expansion(0.01)
  ) +
  scale_y_continuous(
    "Hit rate",
    limits = c(0, 1),
    expand = expansion(0.01)
  ) +
  geom_line(
    aes(group = i), 
    alpha = .25,
    linewidth = .33
  ) +
  theme(
    aspect.ratio = 1
  )

Reuse

Citation

BibTeX citation:
@online{vuorre2023,
  author = {Vuorre, Matti and Zloteanu, Mircea},
  title = {Analyzing {Deception} {Data} with {Hierarchical} {Bayesian}
    {Signal} {Detection} {Theoretic} {Models} {(Computational} Notebook
    Supplement to {TBD)}},
  date = {2023-04-21},
  langid = {en}
}
For attribution, please cite this work as:
Vuorre, Matti, and Mircea Zloteanu. 2023. “Analyzing Deception Data with Hierarchical Bayesian Signal Detection Theoretic Models (Computational Notebook Supplement to TBD).” April 21, 2023.